Tutorial for gene panel apps utilization

In this tutorial, we will demonstrate four apps that exerted the most potentials of geometric deep-learning results. i.e. the ‘L’ stage in DMLA cycle. Here, we developed thoes apps to unlock the natural potentials that GNNs have learned from the domain-knowledge integrated datasets.

DEMO1.1: Yellow light induced optogenetic denitrifier based on metatranscriptomics

APP1: 2D-Visualization of gene panels (cluster)

basic_var() can obtain the basic variable list, including: 1. label_path, 2. output_path, 3. graphdata.path, 4. fig.path, 5. kegg_anno.path, 6. Sub_Swiss.path,7. graphdata, 8. labels, 9. kegg_anno, Sub_Swiss, 10. matched_swiss, 11. matched_kegg

Clustering() projected the clustering results of DEGs in the 2D latent space to picture the gene panel.

# setting parameters
rm(list=ls()) 
source('./scr/R/AppFunctions.R')

dataset = 'LY_9samples_metatranscriptomics' # dataset name
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'cowplot'
## 
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p

Annotate the gene panel (clusters) or interested genes and pathways through function Annotation(). Before annotation, set your interested annotations type by define GeneSet_bycluster or GeneSet_byswiss or GeneSet_kegg. Then, Annotation() will save the results of annotation in local files, which is the annotation searching results that could be utilized for locating genes/clusters in the 2D latent space.

The output are all unique genes. Note that some genes without annotation will not be taken account.

# --- load functional genes ---
source('./scr/R/AppFunctions.R')
library(dplyr)

# setting parameters
dataset = 'LY_9samples_metatranscriptomics'
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7') # annotate gene panel information of gemes 
GeneSet_byswiss <- list(c('nitrate', 'nitrite')) # annotate nitrate and nitrite related genes based on Swiss-Prot database.
GeneSet_bykegg <- list(c('Phototransduction')) # annotate interested pathways, here the phototransuction pathway, based on KEGG database.

# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
#            GeneSet_byswiss=0, 
#            GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
           GeneSet_byswiss=0, 
           GeneSet_bycluster)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 512"
## [1] "The total gene number of Cluster 2 is: 353"
## [1] "The total gene number of Cluster 3 is: 26"
## [1] "The total gene number of Cluster 4 is: 120"
## [1] "The total gene number of Cluster 5 is: 783"
## [1] "The total gene number of Cluster 6 is: 80"
## [1] "The total gene number of Cluster 7 is: 131"

APP2: Gene panel enrichment analysis to unlock unknown potentials of microbiomes

This is a powerful app at ‘L’ stage of DMLA cycle that can mine the tremendous potential of nature based on genetic codes, i.e. meta-omic data.

Here, we will conduct pathways enrichment analysis to predict the collective behaviors of microbiota, as well as some unknown phenotype for versatile purposes, such as bio-production, pollutant removal and resources recovery.

You need to:

  1. *Choose your interested Cluster and subdataset
    • set the parameter: dataset, group, SubDataset and Cluster.
  2. *Create group information of subdataset and set the parameters
    • For targeted subdataset, e.g. group_Yellow, please create a new group file (e.g. group_Yellow.csv) at ‘inputdata’ based on group.csv, keeping only reference group and experimental group.
  3. *Set threshod for filtering

Finally, the cell will visualizes gene panel enrichment analysis.

# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
Cluster = 'Cluster 3' # set the cluster for enrichment analysis

# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Dark' # reference group, i.e. the name of control group
group_sample <- 'group_Yellow' 
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset, 
                            Cluster)

# setting for visualization app, the plot will be saved at 'output_data/Figures/' with the name of fig.name
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
  
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>% 
  group_by(level3_pathway_name) %>%
  summarise(Dark1mean = mean(Dark1), 
            Dark2mean = mean(Dark2),
            Dark3mean = mean(Dark3),
            Yellow1mean = mean(Yellow1), 
            Yellow2mean = mean(Yellow2),
            Yellow3mean = mean(Yellow3)) 

# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset,ref_group, group_sample.data,  pathway, annotation.level)
## 
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
## 
##     get_legend
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)

# *set threshod for filtering
p_thre <- 0.2 # threshold of p-value, i.e. significance level 
FC_up <- 2 # upper limit of fold change
FC_down <- 0.9 # lower limit of fold change
Expre <- 10 # expression level/abundance

# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
                   p_thre, FC_up, FC_down, Expre,
                   group_sample.data, annotation.level)
## Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
## ℹ Please use one dimensional logical vectors instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/YvsD_SubCell_dgi_Cluster 3_Enrichment.csv Exist."
## Warning: ggrepel: 21 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fig
## Warning: ggrepel: 28 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

APP3: Gene Panel comparison

To conduct gene panels comparison, enrichment analysis of targeted panel is a prerequisite. Please make sure to have used APP2 for enrichment analysis with proper threshold.

Based on the comparison, we can identify the role of gene panels in the whole gene regulatory network, such as Hub Gene Panels (HGPs) and Signaling Gene Panels (SGPs).

  • HGPs: Genes with high expression, usually corresponding to the center metabolism or phenotype of microbiomes. (e.g. Cluster 3 in the following case)

  • SGPs: Genes with low expression but strikingly high fold change. (e.g. Cluster 6 and Cluster 5 in the following case)

In default, all the clusters will be compared.

# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'

p <- GPsEnrichCompare(dataset, group, SubDataset, k=7) 
## Warning: ggrepel: 130 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 136 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

APP4: Topology Network with landmark genes for developing regulatory strategy

This is an incredible and powerful apps that can help develop biotechnology to regulate microbial community.

From app3, we can identify the Hub Gene Panels (HGPs) and Signaling Gene Panels (HGPs). To better regulate the microbiome, we construct regulatory network, also called interaction network, on the genes level based on microbial genetic topology.

Cross-platform usages: The occor.r and graph_list can also be utilized for visualization in Gephi or R.

The output results is landmark genes list. The top 5 landmark genes’ name will be printed, which can be utilized to develop regulatory strategy on the microbiota. Detail descriptions of landmark genes are available in variable: landmark.ordered.

# setting parameters
source('./scr/R/AppFunctions.R')
dataset <- 'LY_9samples_metatranscriptomics'
group <- 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
# *set basic data for statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Yellow'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to. 
Mode <- 'Manu' # another mode is: 'Manu'

# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel

if (Mode == 'Search'){
  # method 1: search the gene panel that interested gene or pathway assigned to 
  interested.gene <- 'nitrite'
  interested.pathway <- 'Phototransduction'
  SearchGene <- FALSE
  SearchPathway <- TRUE
  # option: level3_pathway_name, ko_des, SwissProt_Description
  Cluster <- Search_GePa(var_list, SearchGene, SearchPathway)
} else if (Mode == 'Manu'){
  # method 2: manually set interested gene panel (cluster)
  Cluster <- 'Cluster 3' # HGP
}

# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/YvsD_SubCell_dgi_Cluster 3_p0.05_r0.8_CorNet.csv exists."
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information 
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)

# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 25"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
                            Cluster, group_sample.data)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]

print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 3 is:"
print(landmark.top5)
## [1] "dnaK, HSPA9"  "lon"          "rlmI"         "HSP90A, htpG" "mutS"

The output of above cell is as follows. it can be known that thses landmark genes related to protein synthesis, including molecular chaperone (“dnaK”, “HSP90A, htpG”), DNA mismatch repair protein and protease, indicating that yellow light regulated the metabolic flux to protein synthesis. This was validated in our wet experiments, i.e. the significantly enhanced extracellular proteins.

[1] "The total nodes number is: 25"
[1] "The top 5 landmark genes of Cluster 3 is:"
[1] "dnaK, HSPA9"  "lon"          "rlmI"         "HSP90A, htpG" "mutS" 

DEMO1.2: Blue light induced optogenetic denitrifier based on metatranscriptomics

APP1: 2D-Visualization of gene panels (cluster)

# setting parameters
rm(list=ls()) 
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p

# --- load functional genes ---
library(dplyr)
source('./scr/R/AppFunctions.R')
# setting parameters
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7')
GeneSet_byswiss <- list(c('nitrate', 'nitrite'))
GeneSet_bykegg <- list(c('Phototransduction'))

Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
           GeneSet_byswiss=0, 
           GeneSet_bycluster) 
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 1838"
## [1] "The total gene number of Cluster 2 is: 767"
## [1] "The total gene number of Cluster 3 is: 4180"
## [1] "The total gene number of Cluster 4 is: 394"
## [1] "The total gene number of Cluster 5 is: 2105"
## [1] "The total gene number of Cluster 6 is: 899"
## [1] "The total gene number of Cluster 7 is: 744"

APP2: Panel enrichment analysis

# rm(list=ls()) 
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
Cluster = 'Cluster 4' 

# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset, 
                            Cluster)

# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
  
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>% 
  group_by(level3_pathway_name) %>%
  summarise(Dark1mean = mean(Dark1), 
            Dark2mean = mean(Dark2),
            Dark3mean = mean(Dark3),
            Blue1mean = mean(Blue1), 
            Blue2mean = mean(Blue2),
            Blue3mean = mean(Blue3)) 

# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset,ref_group, group_sample.data,  pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)

# *set threshod for filtering
if (Cluster == 'Cluster 4'){
  p_thre <- 0.05
  FC_up <- 2
  FC_down <- 0.2
  Expre <- 20
} else if (Cluster == 'Cluster 5'){
  p_thre <- 0.001 
  FC_up <- 10
  FC_down <- 0.5
  Expre <- 10
} else if (Cluster == 'Cluster 7'){
  p_thre <- 0.01
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 10
} else if (Cluster == 'Cluster 2'){
  p_thre <- 0.001
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 10
} else if (Cluster == 'Cluster 3'){
  p_thre <- 0.01
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 5.5
} else if (Cluster == 'Cluster 1'){
  p_thre <- 0.01
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 1.5
} else if (Cluster == 'Cluster 6'){
  p_thre <- 0.01
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 10
}


# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
                   p_thre, FC_up, FC_down, Expre,
                   group_sample.data, annotation.level)
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/BvsD_SubCell_dgi_Cluster 4_Enrichment.csv Exist."
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fig
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

APP3: Gene Panel comparison

# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'

p <- GPsEnrichCompare(dataset, group, SubDataset, k=7) 
## Warning: ggrepel: 100 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 101 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

APP4: Topology Network with landmark genes regulatory strategy

# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
# *set basic data for statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to. 
Mode <- 'Manu' # another mode is: 'Manu'

# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel

if (Mode == 'Search'){
  # method 1: search the gene panel that interested gene or pathway assigned to 
  interested.gene <- 'nitrate'
  interested.pathway <- 'Phototransduction'
  SearchGene <- TRUE
  SearchPathway <- FALSE
  # option: level3_pathway_name, ko_des, SwissProt_Description
  Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
  # method 2: manually set interested gene panel (cluster)
  Cluster <- 'Cluster 4' # HGP
}

# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/BvsD_SubCell_dgi_Cluster 4_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information 
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)

# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 85"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
                            Cluster, group_sample.data)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]

print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 4 is:"
print(landmark.top5)
## [1] "fliC, hag" "dapA"      "SOD2"      "rlmI"      "fliC, hag"

The output of above cell is as follows.

## for cluster 3:
[1] "The total nodes number is: 85"
[1] "The top 5 landmark genes of Cluster 4 is:"
[1] "fliC, hag" "dapA"      "SOD2"      "rlmI"      "fliC, hag"

# for phototransduction: 
[1] "Most of  Phototransduction related gene belong to:"
Cluster 5 
       13 
[1] "The cluster distribution of Phototransduction is:"

Cluster 3 Cluster 5 
        5        13 
[1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/BvsD_SubCell_dgi_Cluster 5_p0.05_r0.8_CorNet.csv exists."
[1] "The total nodes number is: 204"
[1] "The top 5 landmark genes of Cluster 5 is:"
[1] "hemH, FECH" "glf"        "E1.11.1.5"  "E1.11.1.19" "cfa"   

For cluster 3, the HGP of blue light and also nitrate reductase assigned to, superoxide dismutase (SOD) and 4-hydroxy-tetrahydrodipicolinate synthase (DapA) are the landmark genes. Both of them are critical enzymes in antioxidant systems, mainly involving in superoxide scavenging. We also validated this in the wet experiments as mentioned in our manuscript.

For phototransduction, it can be observed that hemH/FECH, the heme is predicted to be effective in regulate phototransduction, i.e. photo-electron signaling transmission process (see manuscript for detailed wet- and dry-lab demenstration, or has been reported by other studies).

DEMO2: Blue light induced CO2 fixation based on metatranscriptomics

APP1: 2D-Visualization of gene panels (cluster)

# --- load functional genes ---
library(dplyr)
source('./scr/R/AppFunctions.R')
# setting parameters
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
p <- Clustering(dataset, group, SubDataset, 
                 reduction = 'umap')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
p

APP2: Panel enrichment analysis

library(dplyr)
source('./scr/R/AppFunctions.R')
# setting parameters
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7') # annotate gene panel information of gemes 
GeneSet_byswiss <- list(c('nitrate', 'nitrite')) # annotate nitrate and nitrite related genes based on Swiss-Prot database.
GeneSet_bykegg <- list(c('Phototransduction')) # annotate interested pathways, here the phototransuction pathway, based on KEGG database.

# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
#            GeneSet_byswiss=0, 
#            GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
           GeneSet_byswiss=0, 
           GeneSet_bycluster)
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 418"
## [1] "The total gene number of Cluster 2 is: 1881"
## [1] "The total gene number of Cluster 3 is: 1757"
## [1] "The total gene number of Cluster 4 is: 828"
## [1] "The total gene number of Cluster 5 is: 889"
## [1] "The total gene number of Cluster 6 is: 2570"
## [1] "The total gene number of Cluster 7 is: 3434"
[1] "The total gene number of Cluster 1 is: 418"
[1] "The total gene number of Cluster 2 is: 1881"
[1] "The total gene number of Cluster 3 is: 1757"
[1] "The total gene number of Cluster 4 is: 828"
[1] "The total gene number of Cluster 5 is: 889"
[1] "The total gene number of Cluster 6 is: 2570"
[1] "The total gene number of Cluster 7 is: 3434"

‘annotation.level’ is the parameter that you can set to your interested annotation database, such as KEGG pathways, enzymes, Swissprots, etc. In this case demonstration, we utilized gene definition as annotation to conduct enrichment analysis.

# rm(list=ls()) 
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
Cluster = 'Cluster 3' 

# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'White'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'Definition'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset, 
                            Cluster)

# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
  
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
  group_by(Definition) %>%
  summarise(White1mean = mean(W_1),
            White2mean = mean(W_2),
            White3mean = mean(W_3),
            Blue1mean = mean(B_1),
            Blue2mean = mean(B_2),
            Blue3mean = mean(B_3))

# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset, ref_group, group_sample.data,  pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)

# *set threshod for filtering
if (Cluster == 'Cluster 4'){
  p_thre <- 0.05
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 20
} else if (Cluster == 'Cluster 5'){
  p_thre <- 0.001
  FC_up <- 5
  FC_down <- 0.5
  Expre <- 20
} else if (Cluster == 'Cluster 7'){
  p_thre <- 0.001
  FC_up <- 5
  FC_down <- 0.5
  Expre <- 20
} else if (Cluster == 'Cluster 2'){
  p_thre <- 0.005
  FC_up <- 5
  FC_down <- 0.4
  Expre <- 20
} else if (Cluster == 'Cluster 3'){
  p_thre <- 0.05
  FC_up <- 10
  FC_down <- 0.5
  Expre <- 20
} else if (Cluster == 'Cluster 1'){
  p_thre <- 0.05
  FC_up <- 2
  FC_down <- 0.5
  Expre <- 10
} else if (Cluster == 'Cluster 6'){
  p_thre <- 0.05
  FC_up <- 5
  FC_down <- 0.5
  Expre <- 20
}


# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, # top 15 will be depicted 
              Cluster,
              p_thre, FC_up, FC_down, Expre,
              group_sample.data, 
              annotation.level = 'Definition')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/output_data/R/Blue_vs_White_Cluster 3_Enrichment.csv Exist."
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Warning: Removed 3 rows containing missing values (`geom_text_repel()`).
fig
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Removed 3 rows containing missing values (`geom_text_repel()`).

  • Enrichment analysis revealed that aldehyde dehydrogenase and inorganic phosphate transporter are two of the most significant genes up-regulated blue light.

APP3: Gene Panel comparison

  • It can be clearly observed that cluster 3 was significantly up-regulated, as well as presenting high expression levels under blue light irradiation. Thus, gene panel 3 is our interested functional panel and further analysis will be conducted on it.
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'

p <- GPsEnrichCompare(dataset, group, SubDataset, k=7) 
## Warning: ggrepel: 216 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 229 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

APP4: Topology Network with landmark genes regulatory strategy

# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
# *set basic data for statistic parameters calculations
ref_group <- 'White'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to. 
Mode <- 'Manu' # another mode is: 'Manu'

# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel

if (Mode == 'Search'){
  # method 1: search the gene panel that interested gene or pathway assigned to 
  interested.gene <- 'nitrate'
  interested.pathway <- 'Phototransduction'
  SearchGene <- TRUE
  SearchPathway <- FALSE
  # option: level3_pathway_name, ko_des, SwissProt_Description
  Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
  # method 2: manually set interested gene panel (cluster)
  Cluster <- 'Cluster 3' # HGP
}

# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8, annotation.level='Definition')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/output_data/R/Blue_vs_White_Cluster 3_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information 
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)

# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 35"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
                            Cluster, group_sample.data, annotation.level='Definition')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
landmark.top5 <- landmark.ordered$Gene[1:6]
landmark.top5.define <- landmark.ordered$Definition[1:6]

print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 3 is:"
print(landmark.top5)
## [1] "kbe:J4771_10640"    "kbe:J4771_05795"    "ctak:4412677_01929"
## [4] "kbe:J4771_07335"    "ccas:EIB73_06150"   "kbe:J4771_06880"
print('Their corresponding function definition are:')
## [1] "Their corresponding function definition are:"
print(landmark.top5.define)
## [1] "small subunit ribosomal protein S20" "small subunit ribosomal protein S16"
## [3] "large subunit ribosomal protein L17" "small subunit ribosomal protein S11"
## [5] "DNA-binding protein HU-beta"         "large subunit ribosomal protein L1"
[1] "The top 5 landmark genes of Cluster 3 is:"
[1] "kbe:J4771_10640"    "kbe:J4771_05795"    "ctak:4412677_01929" "kbe:J4771_07335"    "ccas:EIB73_06150"    "kbe:J4771_06880"   
[1] "Their corresponding function definition are:"
[1] "small subunit ribosomal protein S20" "small subunit ribosomal protein S16" "large subunit ribosomal protein L17"
[4] "small subunit ribosomal protein S11" "DNA-binding protein HU-beta"   "large subunit ribosomal protein L1" 
  • Under blue light illumination, the transcriptomic activities, especially rRNA, were activated.

DEMO3: Blue light regulated denitrifier based on metagenomics

APP1: 2D-Visualization of gene panels (cluster)

source('./scr/R/AppFunctions.R')
# --- load functional genes ---
library(dplyr)

# setting parameters
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p

[1] "The total gene number of Cluster 1 is: 19"
[1] "The total gene number of Cluster 2 is: 10"
[1] "The total gene number of Cluster 3 is: 7"
[1] "The total gene number of Cluster 4 is: 20"
[1] "The total gene number of Cluster 5 is: 8"
[1] "The total gene number of Cluster 6 is: 11"
[1] "The total gene number of Cluster 7 is: 13"
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7')
# GeneSet_byswiss <- list(c('nitrate', 'nitrite'))
# GeneSet_bykegg <- list(c('Phototransduction'))

# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
#            GeneSet_byswiss=0, 
#            GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
           GeneSet_byswiss=0, 
           GeneSet_bycluster)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 24"
## [1] "The total gene number of Cluster 2 is: 6"
## [1] "The total gene number of Cluster 3 is: 19"
## [1] "The total gene number of Cluster 4 is: 10"
## [1] "The total gene number of Cluster 5 is: 18"
## [1] "The total gene number of Cluster 6 is: 7"
## [1] "The total gene number of Cluster 7 is: 4"

APP2: Panel enrichment analysis

  • ‘group_blue.csv’ need to be manually created
# rm(list=ls()) 
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
Cluster = 'Cluster 4' 

# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset, 
                            Cluster)

# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
  
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
  group_by(level3_pathway_name) %>%
  summarise(Dark1mean = mean(D1A),
            Dark2mean = mean(D2A),
            Dark3mean = mean(D3A),
            Blue1mean = mean(B1A),
            Blue2mean = mean(B2A),
            Blue3mean = mean(B3A))

# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset, ref_group, group_sample.data,  pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)

# *set threshod for filtering
if (Cluster == 'Cluster 4'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 5'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 7'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 2'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 3'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 0.5
  Expre <- 0
} else if (Cluster == 'Cluster 1'){
  p_thre <- 0.2
  FC_up <- 2
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 6'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
}


# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
                   p_thre, FC_up, FC_down, Expre,
                   group_sample.data)
## [1] "File ./data/LY_15samples_metagenomics/output_data/R/Blue_vs_Dark_Cluster 4_Enrichment.csv Exist."
fig

The above gene panel enrichment on cluster 4 unveiled that blue light mainly contribute to activating signals transduction through two-component system. This is also consistent with the results of metatranscriptomic analysis.

APP3: Gene Panel comparison

# setting parameters
source('./scr/R/AppFunctions.R')

dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'

p <- GPsEnrichCompare(dataset, group, SubDataset, k=4) 
p

APP4: Topology Network with landmark genes for developing regulatory strategy

# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
# *set basic data for statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to. 
Mode <- 'Manu' # another mode is: 'Manu'

# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel

if (Mode == 'Search'){
  # method 1: search the gene panel that interested gene or pathway assigned to 
  interested.gene <- 'nitrate'
  interested.pathway <- 'Phototransduction'
  SearchGene <- TRUE
  SearchPathway <- FALSE
  # option: level3_pathway_name, ko_des, SwissProt_Description
  Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
  # method 2: manually set interested gene panel (cluster)
  Cluster <- 'Cluster 4' # HGP
}

# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## [1] "File ./data/LY_15samples_metagenomics/output_data/R/Blue_vs_Dark_Cluster 4_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information 
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)

# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 9"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
                            Cluster, group_sample.data)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]

print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 4 is:"
print(landmark.top5)
## [1] "mlq:ASQ50_10875" "rbg:BG454_08250" "-"               "psa:PST_0389"   
## [5] "cbaa:SRAA_0573"
  • Thus, the landmark genes of cluster 4 is: “mlq:ASQ50_10875” “rbg:BG454_08250” “-” “psa:PST_0389” [5] “cbaa:SRAA_0573”

DEMO4: Engineering case

APP1: 2D-Visualization of gene panels (cluster)

  • Gene panel visualization
rm(list=ls()) 
source('./scr/R/AppFunctions.R')
# --- load functional genes ---
library(dplyr)

# setting parameters
dataset='JS_18samples_metagenomics' # dataset 
group='Denitrification_vs_Nitritation' # subdataset 
SubDataset <- 'Denitrification_vs_Nitritation' 

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
p

  • Annotation
# --- load functional genes ---
source('./scr/R/AppFunctions.R')
# --- load functional genes ---
library(dplyr)

# setting parameters
dataset='JS_18samples_metagenomics' # dataset 
group='Denitrification_vs_Nitritation' # subdataset 
SubDataset <- 'Denitrification_vs_Nitritation' 

var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7') # annotate gene panel information of gemes 
GeneSet_byswiss <- list(c('nitrate', 'nitrite')) # annotate nitrate and nitrite related genes based on Swiss-Prot database.
GeneSet_bykegg <- list(c('Phototransduction')) # annotate interested pathways, here the phototransuction pathway, based on KEGG database.

# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
#            GeneSet_byswiss=0, 
#            GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
           GeneSet_byswiss=0, 
           GeneSet_bycluster)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 187"
## [1] "The total gene number of Cluster 2 is: 10"
## [1] "The total gene number of Cluster 3 is: 0"
## [1] "The total gene number of Cluster 4 is: 57"
## [1] "The total gene number of Cluster 5 is: 0"
## [1] "The total gene number of Cluster 6 is: 16"
## [1] "The total gene number of Cluster 7 is: 7"
  • The annotated genes number of different gene panels are as follows:
[1] "The total gene number of Cluster 1 is: 187"
[1] "The total gene number of Cluster 2 is: 10"
[1] "The total gene number of Cluster 3 is: 0"
[1] "The total gene number of Cluster 4 is: 57"
[1] "The total gene number of Cluster 5 is: 0"
[1] "The total gene number of Cluster 6 is: 16"
[1] "The total gene number of Cluster 7 is: 7"

APP2: Panel enrichment analysis

# rm(list=ls()) 
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
# setting parameters
dataset='JS_18samples_metagenomics' # dataset 
group='Denitrification_vs_Nitritation' # subdataset 
SubDataset <- 'Denitrification_vs_Nitritation' 
Cluster = 'Cluster 1' 

# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Nitritation'
group_sample <- 'group_Denitrification'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset, 
                            Cluster)

# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
  
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
  group_by(level3_pathway_name) %>%
  summarise(Nitritation1mean = mean(Nitritation_1),
            Nitritation2mean = mean(Nitritation_2),
            Nitritation3mean = mean(Nitritation_3),
            Denitrification1mean = mean(Denitrification_1),
            Denitrification2mean = mean(Denitrification_2),
            Denitrification3mean = mean(Denitrification_3))

# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset, ref_group, group_sample.data,  pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)

# *set threshod for filtering
if (Cluster == 'Cluster 4'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 5'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 7'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 2'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 3'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 0.5
  Expre <- 0
} else if (Cluster == 'Cluster 1'){
  p_thre <- 0.2
  FC_up <- 2
  FC_down <- 1
  Expre <- 0
} else if (Cluster == 'Cluster 6'){
  p_thre <- 0.2
  FC_up <- 1
  FC_down <- 1
}


# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
                   p_thre, FC_up, FC_down, Expre,
                   group_sample.data, annotation.level)
## [1] "File ./data/JS_18samples_metagenomics/output_data/R/Denitrification_vs_Nitritation_Cluster 1_Enrichment.csv Exist."
fig

  • From the gene panel enrichment results, it can be seen that denitrification microbiome sampled from practical engineering tend to synthesis more EPS, including lipopolysaccharide and exopolysaccharide, which was regulated by two-component system. This is also consistent well with the batch experiments results.

APP3: Gene Panel comparison

# setting parameters
source('./scr/R/AppFunctions.R')

dataset='JS_18samples_metagenomics' # dataset 
group='Denitrification_vs_Nitritation' # subdataset 
SubDataset <- 'Denitrification_vs_Nitritation' 

p <- GPsEnrichCompare(dataset, group, SubDataset, k=2) 
## Warning: ggrepel: 22 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 37 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

APP4: Topology Network with landmark genes for developing regulatory strategy

# setting parameters
source('./scr/R/AppFunctions.R')
dataset='JS_18samples_metagenomics' # dataset 
group='Denitrification_vs_Nitritation' # subdataset 
SubDataset <- 'Denitrification_vs_Nitritation' 
# *set basic data for statistic parameters calculations
ref_group <- 'Nitritation'
group_sample <- 'group_Denitrification'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to. 
Mode <- 'Manu' # another mode is: 'Manu'

# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel

if (Mode == 'Search'){
  # method 1: search the gene panel that interested gene or pathway assigned to 
  interested.gene <- 'nitrate'
  interested.pathway <- 'Phototransduction'
  SearchGene <- TRUE
  SearchPathway <- FALSE
  # option: level3_pathway_name, ko_des, SwissProt_Description
  Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
  # method 2: manually set interested gene panel (cluster)
  Cluster <- 'Cluster 1' # HGP
}

# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## [1] "File ./data/JS_18samples_metagenomics/output_data/R/Denitrification_vs_Nitritation_Cluster 1_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information 
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)

# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 97"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
                            Cluster, group_sample.data)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]

print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 1 is:"
print(landmark.top5)
## [1] "pilN"      "pilP"      "E6.4.1.4B" "cysE"      "gspH"
[1] "The top 5 landmark genes of Cluster 1 is:"
[1] "pilN"      "pilP"      "E6.4.1.4B" "cysE"      "gspH"
  • As revealed above on the practical engineering projects’ denitrification microbiome, the top 5 landmark genes of gene panel 1 is: “pilN”, “pilP”, “E6.4.1.4B”, “cysE” and “gspH”